The paper discusses the potential of large vision-language models as objects of interest for empirical cultural studies. Focusing on a comparative analysis of outputs from two popular text-to-image synthesis models, DALL-E 2 and Stable Diffusion, the paper examines the pros and cons of striving towards culturally agnostic vs. culturally specific AI models. The paper discusses several examples of memorization and bias in generated outputs that showcase the trade-off between risk mitigation and cultural specificity, as well as the overall impossibility of developing culturally agnostic models.
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to read out information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
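To make the multi-task readout idea concrete, the following is a minimal sketch (not the authors' released code) of a shared encoder with two heads over the same micro-CT patch: an image-level brain-region classifier and a pixel-level microstructure segmentation head. The class counts, patch size, and architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoTaskReadout(nn.Module):
    def __init__(self, n_regions=4, n_microstructures=3):
        super().__init__()
        # Shared convolutional encoder over 2D slices of the X-ray volume.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Image-level head: which brain region does this patch come from?
        self.region_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_regions)
        )
        # Pixel-level head: segmentation of microstructures (e.g., cells, vessels, background).
        self.seg_head = nn.Conv2d(64, n_microstructures, 1)

    def forward(self, x):
        h = self.encoder(x)
        return self.region_head(h), self.seg_head(h)

model = TwoTaskReadout()
patch = torch.randn(8, 1, 128, 128)        # batch of grayscale microCT patches
region_logits, seg_logits = model(patch)   # shapes: (8, 4) and (8, 3, 128, 128)
```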
The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land cover on French territory and provides referential geographical datasets, including high-resolution aerial images and topographic maps. The monitoring of land cover plays a crucial role in land management and planning initiatives, which can have significant socio-economic and environmental impact. Together with remote sensing technologies, artificial intelligence (AI) promises to become a powerful tool for determining land cover and its evolution. IGN is currently exploring the potential of AI in the production of high-resolution land cover maps. Notably, deep learning methods are employed to obtain a semantic segmentation of aerial images. However, territories as large as France imply heterogeneous contexts: variations in landscapes and image acquisition make it challenging to provide uniform, reliable and accurate results across all of France. The FLAIR-one dataset presented here is part of the data currently used at IGN to establish the French national reference land cover map "Occupation du sol à grande échelle" (OCS-GE).
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked-out image-text-aligned vision features conditioned on visible image patches. Via this pretext task, we can efficiently scale up EVA to one billion parameters and set new records on a broad range of representative vision downstream tasks, such as image recognition, video action recognition, object detection, instance segmentation and semantic segmentation, without heavy supervised training. Moreover, we observe that quantitative changes in scaling EVA result in qualitative changes in transfer learning performance that are not present in other models. For instance, EVA takes a great leap on the challenging large-vocabulary instance segmentation task: our model achieves almost the same state-of-the-art performance on the LVISv1.0 dataset, with over a thousand categories, as on the COCO dataset, with only eighty categories. Beyond a pure vision encoder, EVA can also serve as a vision-centric, multi-modal pivot to connect images and text. We find that initializing the vision tower of a giant CLIP from EVA can greatly stabilize training and outperform the from-scratch counterpart with much fewer samples and less compute, providing a new direction for scaling up and accelerating the costly training of multi-modal foundation models. To facilitate future research, we release all the code and models at https://github.com/baaivision/EVA.
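As a rough illustration of the pretext task described above, the sketch below regresses frozen CLIP-style vision features at masked patch positions from the visible patches. The tiny transformer, feature dimensions, random tensors standing in for the CLIP teacher, and the MSE objective are stand-in assumptions, not the released EVA recipe.

```python
import torch
import torch.nn as nn

dim, n_patches = 256, 196
student = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=2
)
to_clip_space = nn.Linear(dim, dim)          # project student tokens to teacher feature space
mask_token = nn.Parameter(torch.zeros(1, 1, dim))

patch_tokens = torch.randn(4, n_patches, dim)    # embedded image patches
teacher_feats = torch.randn(4, n_patches, dim)   # frozen CLIP vision features (stand-in)
mask = torch.rand(4, n_patches) < 0.4            # randomly mask 40% of patches

# Replace masked patches with a learnable mask token, encode, and regress the
# teacher features only at the masked positions.
inp = torch.where(mask.unsqueeze(-1), mask_token.expand_as(patch_tokens), patch_tokens)
pred = to_clip_space(student(inp))
loss = nn.functional.mse_loss(pred[mask], teacher_feats[mask])
loss.backward()
```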
Purpose: Localization and segmentation of individual bones is an important preprocessing step in many planning and navigation applications. It is, however, a time-consuming and repetitive task when done manually. This is true not only for clinical practice but also for the acquisition of training data. We therefore not only propose an end-to-end learned algorithm that is able to segment 125 distinct bones in upper-body CT, but also provide an ensemble-based uncertainty measure that helps to single out scans for enlarging the training dataset. Methods: We create fully automated, end-to-end learned segmentations using a neural network architecture inspired by the 3D-UNet and fully supervised training. The results are improved with ensembles and inference-time augmentation. We examine how the ensemble uncertainty of an unlabeled scan relates to its prospective usefulness as part of the training dataset. Results: Our methods are evaluated on an in-house dataset of 16 upper-body CT scans with a resolution of 2 mm per dimension. Taking into account all 125 bones in our label set, our most successful ensemble achieves a median Dice score coefficient of 0.83. We find a lack of correlation between a scan's ensemble uncertainty and its prospective influence on the accuracy obtained with an enlarged training set. At the same time, we show that the ensemble uncertainty correlates with the number of voxels that require manual correction after an initial automated segmentation, thus minimizing the time needed to finalize a new ground-truth segmentation. Conclusion: In combination, scans with low ensemble uncertainty require less annotation time while yielding similar future DSC improvements. They are therefore ideal candidates for enlarging a training set for the segmentation of distinct upper-body bones from CT scans.
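The ensemble-based uncertainty measure can be illustrated with the following schematic sketch (not the paper's code): average the softmax volumes of the ensemble members, compute voxel-wise entropy, and reduce it to a scan-level score that can be used to prioritize scans for annotation.

```python
import torch

def scan_uncertainty(prob_maps):
    """prob_maps: list of (C, D, H, W) softmax volumes from ensemble members."""
    mean_probs = torch.stack(prob_maps).mean(dim=0)                   # (C, D, H, W)
    voxel_entropy = -(mean_probs * torch.log(mean_probs + 1e-8)).sum(dim=0)
    return voxel_entropy.mean().item()                                # scan-level score

# Toy example: 3 ensemble members, 126 classes (125 bones + background), small volume.
ensemble_outputs = [torch.softmax(torch.randn(126, 8, 32, 32), dim=0) for _ in range(3)]
print(scan_uncertainty(ensemble_outputs))
```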
Language is the method by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another through either speech or writing. However, every language also has a signed counterpart: individuals who are deaf and/or mute communicate through sign language. Bengali likewise has a sign language, known as BDSL. The dataset presented here consists of images of Bangla hand signs. The collection covers the 49 individual Bangla alphabet signs. BDSL49 is a dataset comprising 29,490 images with 49 labels. During data collection, images of 14 different adults were recorded, each with a distinct background and appearance. Several strategies were used during preparation to eliminate noise from the dataset. The dataset is freely available to researchers, who can use it to develop automated systems with machine learning, computer vision, and deep learning techniques. In addition, two models have been used with this dataset: the first for detection and the second for recognition.
The current rapid changes in climate increase the urgency of transforming how energy production and consumption are managed, in order to reduce carbon and other greenhouse gas emissions. In this context, the French electricity network management company RTE (Réseau de Transport d'Électricité) has recently published the results of an extensive study outlining various scenarios for the management of France's energy system of tomorrow. We propose a challenge that will test the feasibility of such scenarios. The goal is to control the transport of electricity in the power network while pursuing multiple objectives: balancing production and consumption, minimizing energy losses, and keeping people and equipment safe, in particular by avoiding catastrophic failures. While the importance of the application provides a goal in itself, the challenge also aims to push the state of the art in a branch of artificial intelligence (AI) known as reinforcement learning (RL), which offers new possibilities for tackling control problems. In particular, several aspects of the combination of deep learning and RL still need to be harnessed in this application domain. The challenge belongs to a series started in 2019 under the name "Learning to Run a Power Network" (L2RPN). In this new edition, we introduce new, more realistic scenarios proposed by RTE for reaching carbon neutrality by 2050, phasing out fossil-fuel electricity production, increasing the share of renewable and nuclear energy, and introducing batteries. Furthermore, we provide a baseline using a state-of-the-art reinforcement learning algorithm to stimulate future participants.
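For readers who want to try the task, the sketch below shows a minimal interaction loop with an L2RPN environment through the grid2op package, using the small sandbox grid and a do-nothing baseline action. The environment name is an assumption for illustration and is not one of the new 2050 carbon-neutrality scenarios introduced in this edition.

```python
import grid2op

env = grid2op.make("l2rpn_case14_sandbox")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space({})              # "do nothing" baseline action
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```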
Product matching is a fundamental step in understanding consumer behavior in e-commerce at a global scale. In practice, product matching refers to the task of deciding whether two product offers, coming from different data sources (e.g., retailers), refer to the same product. Standard pipelines use a preceding stage called blocking, in which, for a given product offer, a set of potential matching candidates with similar characteristics (e.g., the same brand, category, flavor, etc.) is retrieved. Among these similar candidate products, those that are not actual matches can be treated as hard negatives. We propose Block-SCL, a strategy that uses the blocking output to make the most of Supervised Contrastive Learning (SCL). Specifically, Block-SCL builds enriched batches using the hard-negative samples obtained in the blocking stage. These batches provide a strong training signal that leads the model to learn more meaningful sentence embeddings for product matching. Experimental results on several public datasets show that Block-SCL achieves state-of-the-art results despite using only short product titles as input, no data augmentation, and a lighter transformer backbone than competing methods.
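The supervised contrastive objective that Block-SCL applies to its blocking-enriched batches can be sketched as follows. The embedding model, batch construction, and exact loss normalization are placeholders rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """embeddings: (N, d) L2-normalized title embeddings; labels: (N,) product ids."""
    sim = embeddings @ embeddings.T / temperature                      # pairwise similarities
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                                         # exclude self-pairs
    logits = sim - torch.eye(len(labels)) * 1e9                        # mask self in denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_per_anchor = mask_pos.sum(1).clamp(min=1)
    return -((mask_pos * log_prob).sum(1) / pos_per_anchor).mean()

# Toy batch: two offers of product 0 (a positive pair) plus blocking hard negatives.
emb = F.normalize(torch.randn(6, 128), dim=1)
labels = torch.tensor([0, 0, 1, 2, 3, 4])
print(supervised_contrastive_loss(emb, labels))
```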
We continue the study of Genetic Algorithms (GA) on combinatorial optimization problems whose candidate solutions need to satisfy a balancedness constraint. It has been observed that the reduction in search space size granted by ad-hoc crossover and mutation operators usually does not translate into a substantial improvement of GA performance. There is still no clear explanation for this phenomenon, although it is suspected that a balanced representation may yield a more irregular fitness landscape, which could make it harder for the GA to converge to a global optimum. In this paper, we investigate this issue by adding a local search step to a GA with balanced operators and using it to evolve highly nonlinear balanced Boolean functions. In particular, we organize the experiments around two research questions: whether local search (1) improves the convergence speed of the GA, and (2) decreases the population diversity. Surprisingly, while our results answer the first question affirmatively, they also show that adding local search actually increases the diversity among the individuals in the population. We link these findings to recent results on the fitness landscape analysis of problems defined over Boolean functions.
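The balancedness-preserving local search step can be illustrated with the following sketch, where fitness is the nonlinearity of a Boolean function computed via the Walsh-Hadamard transform and each move swaps a 0-entry with a 1-entry of the truth table so that the function stays balanced. The surrounding GA and the exact move-acceptance rule used by the authors are omitted; this is an assumption-laden illustration, not their code.

```python
import numpy as np

def nonlinearity(truth_table):
    """Nonlinearity of a Boolean function given as a 0/1 truth table of length 2^n."""
    w = (1 - 2 * truth_table.astype(int)).copy()   # {0,1} -> {+1,-1}
    h = 1
    while h < len(w):                               # fast Walsh-Hadamard transform
        for i in range(0, len(w), 2 * h):
            a, b = w[i:i + h].copy(), w[i + h:i + 2 * h].copy()
            w[i:i + h], w[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return len(w) // 2 - np.abs(w).max() // 2       # 2^(n-1) - max|W_f|/2

def local_search(tt, steps=100, rng=np.random.default_rng(0)):
    best = nonlinearity(tt)
    for _ in range(steps):
        i = rng.choice(np.where(tt == 0)[0])
        j = rng.choice(np.where(tt == 1)[0])
        tt[i], tt[j] = 1, 0                         # balanced swap move
        new = nonlinearity(tt)
        if new >= best:
            best = new
        else:
            tt[i], tt[j] = 0, 1                     # revert a worsening move
    return tt, best

n = 6
tt = np.array([0] * (2 ** (n - 1)) + [1] * (2 ** (n - 1)))
np.random.default_rng(1).shuffle(tt)
tt, nl = local_search(tt)
print("nonlinearity:", nl)
```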
Natural behavior consists of dynamics that are unpredictable, can switch suddenly, and unfold over many different timescales. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings because they assume a single scale of temporal dynamics. In this work, we introduce Bootstrap Across Multiple Scales (BAMS), a multi-scale representation model: we combine a pooling module that aggregates features extracted by encoders with different temporal receptive fields, and design a set of latent objectives to bootstrap the representations in each respective space, encouraging disentanglement across different timescales. We first apply our method to a dataset of quadrupeds navigating across different terrain types and show that our model captures the temporal complexity of behavior. We then apply our method to the MABe 2022 Multi-Agent Behavior challenge, where our model placed third overall and first on two subtasks, and show the importance of incorporating multiple timescales when analyzing behavior.
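The multi-scale encoding idea can be sketched as follows: two temporal convolutional encoders with short and long receptive fields run over the same behavioral time series, and their features are pooled per timestep. The architecture details, input dimensions, and the bootstrapped latent objectives of the paper are not reproduced here; everything below is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    def __init__(self, in_dim=12, feat_dim=32):
        super().__init__()
        # Short receptive field: fast, fine-grained movement features.
        self.fast = nn.Conv1d(in_dim, feat_dim, kernel_size=3, padding=1)
        # Long receptive field (dilated): slow behavioral states.
        self.slow = nn.Conv1d(in_dim, feat_dim, kernel_size=3, padding=8, dilation=8)

    def forward(self, x):                       # x: (batch, channels, time)
        f, s = torch.relu(self.fast(x)), torch.relu(self.slow(x))
        return torch.cat([f, s], dim=1)         # pooled multi-scale features per timestep

enc = MultiScaleEncoder()
pose = torch.randn(2, 12, 500)                  # e.g., 2 animals, 12 keypoint channels, 500 frames
print(enc(pose).shape)                          # torch.Size([2, 64, 500])
```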